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[ Abstract. Non-Gaussian outcomes are often modeled using members 

of the so-called exponential family. Notorious members are the Bernoulli 
$3 . model for binary data, leading to logistic regression, and the Poisson 

model for count data, leading to Poisson regression. Two of the main 
reasons for extending this family are (1) the occurrence of overdisper- 
sion, meaning that the variability in the data is not adequately de- 
' scribed by the models, which often exhibit a prescribed mean-variance 

link, and (2) the accommodation of hierarchical structure in the data, 
stemming from clustering in the data which, in turn, may result from 
■ repeatedly measuring the outcome, for various members of the same 

c/3 . family, etc. The first issue is dealt with through a variety of overdisper- 

sion models, such as, for example, the beta-binomial model for grouped 
binary data and the negative-binomial model for counts. Clustering is 
£^ | often accommodated through the inclusion of random subject-specific 

' effects. Though not always, one conventionally assumes such random 

Q\ . effects to be normally distributed. While both of these phenomena may 

occur simultaneously, models combining them are uncommon. This pa- 
per proposes a broad class of generalized linear models accommodat- 
ing over dispersion and clustering through two separate sets of random 
effects. We place particular emphasis on so-called conjugate random 
effects at the level of the mean for the first aspect and normal random 
effects embedded within the linear predictor for the second aspect, even 
^ ■ though our family is more general. The binary, count and time-to-event 

cases are given particular emphasis. Apart from model formulation, we 
present an overview of estimation methods, and then settle for max- 
imum likelihood estimation with analytic-numerical integration. Im- 
plications for the derivation of marginal correlations functions are dis- 
cussed. The methodology is applied to data from a study in epileptic 
seizures, a clinical trial in toenail infection named onychomycosis and 
survival data in children with asthma. 

Key words and phrases: Bernoulli model, Beta-binomial model, Cauchy 
distribution, conjugacy maximum likelihood, frailty model, negative- 
binomial model, Poisson model, strong conjugacy, Weibull model. 
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1. INTRODUCTION 

Next to continuous outcomes, binary and bino- 
mial outcomes, counts and times to event take a 
prominent place in applied modeling and the corre- 
sponding methodological literature. It is common to 
place such models within the generalized linear mod- 
eling (GLM) framework (Nelder and Wedderburn, 
1972; McCullagh and Nelder, 1989; Agresti, 2002). 
This framework allows one to restrict specification 
to first and second moments only, on the one hand, 
or to fully formulate distributional assumptions, on 
the other hand. When the latter route is chosen, the 
exponential family (McCullagh and Nelder, 1989) 
provides an elegant and encompassing mathematical 
framework, because it has the normal, Bernoulli/bi- 
nomial, Poisson and Weibull/exponential models as 
prominent members. 

The elegance of the framework draws from certain 
linearity properties of the log-likelihood function, 
producing mathematically convenient score equations 
and ultimately convenient-in-use inferential instru- 
ments, both in terms of point and interval estima- 
tion as well as for hypothesis testing. 

Nevertheless, it has been clear for several decades, 
for binomial, count and time-to-event data, that a 
key feature of the GLM framework and many of the 
exponential family members, the so-called mean- 
variance relationship, may be overly restrictive. By 
this relationship, we indicate that the variance is a 
deterministic function of the mean. For example, for 
Bernoulli outcomes with success probability fi = ir, 
the variance is v(/j,) = n(l — it), for counts v(fj,) = \i 
and for the exponential model v{(j) = fi 2 . In con- 
trast, for continuous, normally distributed outcomes, 
the mean and variance are entirely separate param- 
eters. While i.i.d. binary data cannot contradict the 
mean-variance relationship, i.i.d. binomial data, 
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counts and survival data can. This explains why 
early work has been devoted to formulating mod- 
els that explicitly allow for over dispersion or, more 
generally, to proposing models that enjoy less re- 
strictive mean-variance relationships. For purely bi- 
nary data, hierarchies need to be present in the data 
in order to violate the mean-variance link. One such 
class of hierarchies is with repeated measures or lon- 
gitudinal data, where an outcome on a study subject 
is recorded repeatedly over time. With such models 
gaining momentum, not only for the Gaussian case 
(Verbeke and Molenberghs, 2000), but also for non- 
Gaussian data (Molenberghs and Verbeke, 2005), 
extensions of the GLM framework have been for- 
mulated. For other types of outcomes, such hierar- 
chical settings further compound the issue of overly 
restrictive variance relationships. In all cases, hier- 
archies induce association. These features taken to- 
gether call for very flexible models, doing proper jus- 
tice to each of the mean, variance and association 
structures. 

Hinde and Demetrio (1998a, 1998b) provide broad 
overviews of approaches for dealing with overdis- 
persion, considering moment-based as well as full- 
distribution avenues. Placing most emphasis on the 
binomial and Poisson settings, they pay particu- 
lar attention to random-effects-based solutions to 
the problem, including but not limited to the beta- 
binomial model (Skellam, 1948; Kleinman, 1973) for 
binary and binomial data and with beta random 
effects, and the negative-binomial model (Breslow, 
1984; Lawless, 1987), where the natural parameter 
is assumed to follow a gamma distribution. The said 
gamma distribution also features in many so-called 
frailty models, that is, specific random-effects mod- 
els for time-to-event data (Duchateau and Janssen, 
2007). On the other hand, especially focusing on hi- 
erarchical data, the so-called generalized linear mixed 
model (GLMM, Engel and Keen, 1994; Breslow and 
Clayton, 1993; Wolfinger and O'Connell, 1993) has 
gained popularity as a tool to accommodate overdis- 
persion and/or hierarchy-induced association for out- 
comes that are not necessarily of a Gaussian type, in 
spite of problems, not only of a computational type, 
but also in terms of interpretation. These arise from 
the combination of general exponential family mod- 
els with normally distributed random effects. Unlike 
for Gaussian data, the derivation of marginal mo- 
ments and joint distributions is less than straight- 
forward, even though in this paper we make progress 
beyond what is available in the literature. Part of 
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GLMMs popularity originates from the availability 
of implementations in a variety of standard soft- 
ware packages. Other solutions to accommodating 
over dispersion include mixture modeling and spe- 
cific models for zero-inflated Poisson models (Rid- 
out, Demetrio and Hinde, 1998; Bohning, 2000; 
McLachlan and Peel, 2000). 

Important unifying and computational progress 
has been made by Lee and Nelder (1996, 2001a, 
2001b, 2003) (see also Lee, Nelder and Pawitan, 
2006) by proposing so-called hierarchical general- 
ized linear models, offering a broad class of outcome 
and random-effects distributions, combined with ap- 
pealing computational schemes. Unification has also 
been reached by Skrondal and Rabe-Hesketh (2004), 
who assemble under the same roof a number of mod- 
eling strands, such as multilevel modeling, structural 
equations modeling, latent variables, latent classes 
and random-effects models for longitudinal and oth- 
erwise hierarchical data. 

In this paper we introduce a general and flexi- 
ble framework for such combinations, starting from 
arbitrary generalized linear models and exponen- 
tial family members. Specific emphasis is placed on 
normally distributed, binary, binomial, count and 
time-to-event outcomes. There are various reasons 
to do so. First, non-Gaussian hierarchical data ex- 
hibit three important features: (1) the mean struc- 
ture; (2) the variance structure; and (3) the corre- 
lation structure. Our proposed framework features: 
(a) a mean structure; (b) overdispersion, often con- 
jugate random-effects; (c) normal random effects. It 
will be clear from our case studies that model fit 
can be improved, and hence model interpretation 
changed, by shifting to the extended model. Sec- 
ond, especially in cases where the variance and/or 
correlation structures are of interest (e.g., surrogate 
marker evaluation, psychometric evaluation, etc.), 
such extensions are useful. Third, even when inter- 
est remains with more conventional models, such 
as the GLMM, the extended model can serve as 
a goodness-of-fit tool. Fourth, because we can de- 
rive closed-form expressions for both standard and 
extended models, the accuracy of parameter esti- 
mation and resulting inferences can be improved, 
while obviating the need for tedious numerical in- 
tegration techniques. Fifth, the analysis of the case 
studies corroborates this need. Such needs were rec- 
ognized by Booth et al. (2003) and Molenberghs, 
Verbeke and Demetrio (2007) who, in the context of 



count data, formulated a model combining normal 
and gamma random effects. 

The paper is organized as follows. In Section 2 
three motivating case studies, with binary data, counts 
and survival data, respectively, are described, with 
analyses reported near the end of the manuscript, in 
Section 6. Basic ingredients for our modeling frame- 
work, standard generalized linear models, extensions 
for overdispersion and the generalized linear mixed 
model, are the subject of Section 3. The proposed, 
combined model is described and further studied in 
Section 4. Avenues for parameter estimation and en- 
suing inferences are explored in Section 5. There are 
several appendices. Supplementary Material A of- 
fers generic approximations for means and variances. 
Supplementary Material B-E provide details for the 
Poisson case, the binary case with logit link and the 
binary case with probit link, and the time-to-event 
case, respectively. Implications of our findings for 
the derivation of marginal correlation functions are 
the topic of Section F in the Supplementary Mate- 
rial. 

2. CASE STUDIES 

We will describe three case studies. The first one 
producing count data, the second one with binary 
data, and the third one of a time-to-event type. 

2.1 A Clinical Trial in Epileptic Patients 

The data considered here are obtained from a ran- 
domized, double-blind, parallel group multi-center 
study for the comparison of placebo with a new anti- 
epileptic drug (AED), in combination with one or 
two other AED's. The study is described in full de- 
tail in Faught et al. (1996). The randomization of 
epilepsy patients took place after a 12-week base- 
line period that served as a stabilization period for 
the use of AED's, and during which the number of 
seizures were counted. After that period, 45 patients 
were assigned to the placebo group and 44 to the ac- 
tive (new) treatment group. Patients were then mea- 
sured weekly. Patients were followed (double-blind) 
during 16 weeks, after which they were entered into a 
long-term open-extension study. Some patients were 
followed for up to 27 weeks. The outcome of inter- 
est is the number of epileptic seizures experienced 
during the most recent week. The research question 
is whether or not the additional new treatment re- 
duces the number of epileptic seizures. 
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2.2 A Casfe Study in Onychomycosis 

These data come from a randomized, double-blind, 
parallel group, multicenter study for the comparison 
of two oral treatments (coded as A and B) for toe- 
nail dermatophyte onychomycosis (TDO), described 
in full detail by De Backer et al. (1996). TDO is a 
common toenail infection, difficult to treat, affect- 
ing more than 2 out of 100 persons (Roberts et al., 
1992). Anti-fungal compounds, classically used for 
treatment of TDO, need to be taken until the whole 
nail has grown out healthy. The development of new 
such compounds, however, has reduced the treat- 
ment duration to 3 months. The aim of the present 
study was to compare the efficacy and safety of 12 
weeks of continuous therapy with treatment A or 
with treatment B. In total, 2 x 189 patients, dis- 
tributed over 36 centers, were randomized. Subjects 
were followed during 12 weeks (3 months) of treat- 
ment and followed further, up to a total of 48 weeks 
(12 months). Measurements were taken at baseline, 
every month during treatment and every 3 months 
afterward, resulting in a maximum of 7 measure- 
ments per subject. At the first occasion, the treat- 
ing physician indicates one of the affected toenails as 
the target nail, the nail which will be followed over 
time. We will restrict our analyses to only those pa- 
tients for which the target nail was one of the two 
big toenails (146 and 148 subjects, in group A and 
group B, respectively). One of the responses of in- 
terest was the unaffected nail length, measured from 
the nail bed to the infected part of the nail, which 
is always at the free end of the nail, expressed in 
mm. This outcome has been studied extensively in 
Verbeke and Molenberghs (2000). Another impor- 
tant outcome in this study was the severity of the 
infection, coded as (not severe) or 1 (severe). The 
question of interest was whether the percentage of 
severe infections decreased over time, and whether 
that evolution was different for the two treatment 
groups. 

2.3 Recurrent Asthma Attacks in Children 

These data have been studied in Duchateau and 
Janssen (2007). Asthma is occurring more and more 
frequently in very young children (between 6 and 
24 months). Therefore, a new application of an ex- 
isting anti-allergic drug is administered to children 
who are at higher risk to develop asthma in order to 
prevent it. A prevention trial is set up with such chil- 
dren randomized to placebo or drug, and the asthma 



events that developed over time are recorded in a di- 
ary. Typically, a patient has more than one asthma 
event. The different events are thus clustered within 
a patient and ordered in time. This ordering can be 
taken into account in the model. The data are pre- 
sented in calendar time format, where the time at 
risk for a particular event is the time from the end 
of the previous event (asthma attack) to the start of 
the next event (start of the next asthma attack). A 
particular patient has different periods at risk dur- 
ing the total observation period which are separated 
either by an asthmatic event that lasts one or more 
days or by a period in which the patient was not 
under observation. The start and end of each such 
risk period is required, together with the status in- 
dicator to denote whether the end of the risk period 
corresponds to an asthma attack or not. 

3. REVIEW OF KEY INGREDIENTS 

In Section 3.1 we will first describe the conven- 
tional exponential family and generalized linear mod- 
eling based on it. Section 3.2 is devoted to a brief 
review of models for overdispersion. Section 3.3 fo- 
cuses on the normally distributed case. 

3.1 Standard Generalized Linear Models 

A random variable Y follows an exponential fam- 
ily distribution if the density is of the form 

f{y) = f{y\v,4>) 

(l) 

= exp-j> [yrj-tp(rj)] + c(y,4>)} 

for a specific set of unknown parameters r] and (f), 
and for known functions ip(-) and c(-, •). Often, 7] and 
<fi are termed "natural parameter" (or "canonical pa- 
rameter") and "dispersion parameter," respectively. 

It can easily be shown (Molenberghs and Verbeke, 
2005) that the first two moments follow from the 
function tp(-) as 

(2) E (Y) = M = V/(r?), 

(3) Var(y) = o 1 = ^"(j)). 

An important implication is that, in general, the 
mean and variance are related through a 2 = 
cj)ijj"[tp'~ 1 (fj 1 )] = (f)v(n), with v(-) the so-called vari- 
ance function, describing the mean-variance rela- 
tionship. 

Key instances of the exponential family for nor- 
mal, binary, count and time-to-event data are listed 
in Table 1, along with their exponential family el- 
ements. The normal model is special, in particular, 



Table 1: Conventional exponential family members and extensions with conjugate random effects 



Element 



Notation 



Continuous 



Binary 



Count 



Time to event 



Standard univariate exponential family 



Model 




Normal 


Bernoulli 


Poisson 


Exponential 


Model 


f(y) 


1 g-(!/-M) 2 /(2cT 2 ) 


7T V {1 -nf-V 


y\ 




Nat. param 
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ln[7r/(l - tt)] 


In A 
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Mean function 
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2 2<f> 





-In 2/! 
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4> 
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1 


1 


Mean 




/' 


7T 
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V- 1 


Variance 




a 2 


7r(l — 7r) 


A 





Exponential family with conjugate random effects 

Normal-normal 



Model 
Hier. model 
RE model 
Marg. model 



Mean 
Variance 



/(#) 
/(*) 

f(y) 

h{9) 

7 

v> 

c(j/,0) 
c* (7,-i/O 

£(y) 

Var(Y) 



1 

1 



-(y-ey/(2<T 2 ) 

; -(e- M ) 2 /(2d) 

-( ! /-M) 2 /(2( [ T 2 +d)) 



Beta-binomial 
QV(l-Qf-V 

fl°-l(l_9)3-l 



Negative-binomial 



Exponential-gamma 



-0//3 



a 2 + d\/2TT 



[tt + fi) 



S(q,/3) 

r(c») 



r(a + y) r(P+l-y) y'.r(a) V p + 



fl»r(o) 
r(a+y) / 1 'i 

a) ^ 8+1 I V 3+1 / 



1/d 

-I^-iln(f) 
.§ 7 V, 2 -§ln(f) 



ln[0/(l - fl)] 
-ln(l-0) 
1 

a + /3-2 



a+3-2 



-lnB(7?/) + 1, 

7 — ^7 + 1) 

a 
a+/3 
a/3 
(Q + /3) 2 



ln(0) 
1 

0(a-l) 

-M2/0 
(1 + 7?/)) In 7 

-lnr(l + 7V0 

a/? 
a/3(/? + l) 



s«r(a) 

(1 + I pl3y)°> + 1 



Vv 

p(a- 1) 
^(a-1)]- 1 
ln(^) 



y+<£ 



Weibull 

<py p 



ippy p 1 e 



^- 2 /"[r(2 P - 1 + i)-r(p- 1 + i) 2 ] 



Weibull-gamma 
(p9py 



-<pey p 



ippy p 1 a/3 
(l + 1 p/3 ! /P) = + 1 



-ln(0)/V 

1/V 
¥?(a- 1) 
[^(a-1)]" 1 



, ln( 7 V)-mr( , 



a[ V3 2 («-l) 2 (a-2)/3 2 ]- 1 



Y + 1 



, ln( 7 V)-lnr(2±£) 



r(a-p- 1 )r( P - 1 +i) 

[2r(a-2p- 1 )r(2p- 1 ) 



P(v« 2/ T( a } 



r( Q -p- 1 ) 2 r( P - 1 ) 2 



O 

a 

H 
tr 1 



^ 
O 

> 
tc 

> 
2i 
O 

o 
o 

t-H 

CI 

o 
r? 

H 

> 
2| 
O 
O 



O 
H 



6 



MOLENBERGHS, VERBEKE, DEMETRIO AND VIEIRA 



also because the over dispersion parameter is needed 
to allow for a variance other than unity. As a re- 
sult, the mean-variance relationship is absent for 
this model, but present for all others. In the binary 
case, an alternative to the Bernoulli model with logit 
link is the probit model, where rj = <J> -1 (7r) and $(•) 
is the standard normal cumulative distribution func- 
tion. Evidently, this model is slightly less standard 
because the probit model is not the natural link, 
as we will see in Section 4.6, it has appeal in the 
overdispersed and/or repeated contexts. 

In the Weibull and exponential model, the decom- 
position <f = Ae^ is often employed, with notation as 
in Table 1, allowing for \i to become a function of 
covariates. Evidently, here, while fi is a component 
of the mean function, it is in itself not equal to the 
mean. Note also that the Weibull model does not 
belong to the exponential family in a conventional 
sense, unless in a somewhat contrived fashion where 
y is replaced by y p . In the mean and variance ex- 
pressions for the Weibull (Table 1), T(-) represents 
the gamma function. 

In some situations, for example, when quasi-likeli- 
hood methods are employed (McCullagh and Nelder, 
1989; Molenberghs and Verbeke, 2005), no full dis- 
tributional assumptions are made, but one rather 
restricts to specifying the first two moments (2) and 
(3). In such an instance, the variance function v(h) 
can be chosen in accordance with a particular mem- 
ber of the exponential family. If not, then param- 
eters cannot be estimated using maximum likeli- 
hood principles. Instead, a set of estimating equa- 
tions needs to be specified, the solution of which is 
referred to as the quasi-likelihood estimates. 

In a regression context, where one wishes to ex- 
plain variability between outcome values based on 
measured covariate values, the model needs to incor- 
porate covariates. This leads to so-called generalized 
linear models. Let Y\, . . . , Yv be a set of independent 
outcomes, and let xi,...,Xjv represent the corre- 
sponding p-dimensional vectors of covariate values. 
It is assumed that all Y have densities f(yi\rji,4>), 
which belong to the exponential family, but a differ- 
ent natural parameter rji is allowed per observation. 
Specification of the generalized linear model is com- 
pleted by modeling the means Hi as functions of the 
covariate values. More specifically, it is assumed that 
Hi = h(rji) = /i(x^), for a known function h(-), and 
with £ a vector of p fixed, unknown regression co- 
efficients. Usually, h~ l (-) is called the link function. 



In most applications, the so-called natural link func- 
tion is used, that is, h(-) = ip'(-), which is equivalent 
to assuming rji = x^£. Hence, it is assumed that the 
natural parameter satisfies a linear regression model. 

3.2 Overdispersion Models 

It is clear from Table 1 that the standard Bernoulli, 
Poisson and exponential models force the mean and 
variance functions to depend on a single parame- 
ter. However, comparing the sample average with 
the sample variance might already reveal in certain 
applications that this assumption is not in line with 
a particular set of data, for count and time-to-event 
data, for example. While this is one of the senses in 
which the binary case is somewhat exceptional, be- 
cause a set of i.i.d. Bernoulli data cannot contradict 
the mean-variance relationship, it would still hold 
for the related binomial case, where the data take 
the form of rij successes out of Zi trials. 

Therefore, a number of extensions have been pro- 
posed, as briefly mentioned in the Introduction. Hinde 
and Demetrio (1998a, 1998b) provide general treat- 
ments of overdispersion. The Poisson case received 
particular attention by Breslow (1984) and Law- 
less (1987). Molenberghs and Verbeke (2005) men- 
tion various model-based approaches that accom- 
modate overdispersion, including the beta-binomial 
model (Skellam, 1948), the Bahadur model (1961), 
the multivariate probit model (Dale, 1986; Molen- 
berghs and Lesaffre, 1994) and certain versions of 
the generalized linear mixed model (Breslow and 
Clayton, 1993). The latter family will be studied in 
Section 3.3. 

A straightforward and commonly encountered step 
is to allow the overdispersion parameter <j) 7^ 1, so 
that (3) produces Var(Y) = 4>v(h)- This is in line 
with the moment-based approach mentioned in the 
previous section, but can also be engendered by fully 
parametric assumptions. 

An elegant way forward is through a two-stage 
approach. For binary data, one would assume that 
Yi\iTi ~ Bernoulli^) and further that iTi is a random 
variable with E(7Tj) = Hi and Var(7Tj) = af. Using 
iterated expectations, it follows that 

E(Y i ) = E[E(Y i \7r i )]=E(TT i ) = fH , 
Var(Y) = E\yai(Yi\iri)] + Var^Y*^)] 
= ^[7ri(l-7Ti)]+Var( 7 r i ) 
= E{it i )-E[rf) + E{i$)-E{it i ) 2 
= /ii(l -/J,i), 
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underscoring that purely Bernoulli data are unable 
to capture overdispersion. 

Likewise, for the Poisson case, we assume that 
Yi\Q ~ Poi(Ci) and then that Q is a random variable 
with E(£i) = ^ and Var(£i) = af. Also here then, it 
follows that 

E(Yl) = E[E(Yl\Ci)]=E(Ci) = m, 
Var(y) = ^[Var^lCi)] + Var[E(y i |Ci)] 
= E(Q + Var(( i )= f i i + al 

Note that we have not assumed a particular distribu- 
tional form for the random effects 7Tj and Q, respec- 
tively. Hence, this gives rise to a semi-parametric 
specification. Similar routes can be followed for other 
GLM, too. 

In case it is considered advantageous to make full 
distributional assumptions about the random effects, 
common choices are the beta distribution for 7Tj and 
the gamma distribution for Q; of course, these are 
not the only ones. 

Generally, the two-stage approach is made up of 
considering a distribution for the outcome, given 
a random effect f(yi\9i) which, combined with a 
model for the random effect, f(8i), produces the 
marginal model: 

(4) f{ Vi )= j fivmfmMi. 

It is easy to extend this model to the case of 
repeated measurements. We then assume a hierar- 
chical data structure, where now Yij denotes the 
jth outcome measured for cluster (subject) i, i = 
1, . . . , N, j = 1, . . . , Hi and Yj is the n^-dimensional 
vector of all measurements available for cluster i. In 
the repeated-measures case, the scalar Q becomes 
a vector Ci = (Cil, ■ ■ ■ , CinJ', with E(d) = and 
Var(£J = Sj. For example, for the Poisson case, 
similar logic as in the univariate case produces 
E(Yi) = m and Var(Yj) = Mj + E i; where Mj is 
a diagonal matrix with the vector along the di- 
agonal. Note that a diagonal structure of Mj re- 
flects the conditional independence assumption: all 
dependence between measurements on the same unit 
stems from the random effects. Generally, a versa- 
tile class of models results. For example, assum- 
ing that the components of £j are independent, a 
pure overdispersion model follows, without correla- 
tion between the repeated measures. On the other 
hand, assuming Qj = that is, that all components 
are equal, then Var(Yj) = Mj + afJ ni , where J ni 



is an (rii x rij )-dimensional matrix of ones. Such a 
structure can be seen as a general version of com- 
pound symmetry. Of course, one can also combine 
general correlation structures between the compo- 
nents of Ci- 

Alternatively, this repeated version of the overdis- 
persion model can be combined with normal ran- 
dom effects in the linear predictor. This very specific 
choice, proposed also by Thall and Vail (1990) and 
Dean (1991), for the count case, will be the focus of 
the next section. 

General marginalization (4) may seem an elegant 
and general principle, there is the issue of having to 
decide which parameter to turn into a random one. 
This is especially true if one considers the need to 
select an actual distributional form for the random 
effect. A noteworthy exception is, as always, the lin- 
ear mixed model, combining a normal hierarchical 
model with a normal random effect. It forms the 
basis of the two strands of random-effects models 
that are potentially brought together in the com- 
bined models of Section 4: on the one hand, nor- 
mal random effects can be considered with nonnor- 
mal outcomes, producing the GLMM; on the other 
hand, gamma random effects for the Poisson model, 
beta random effects with binomial data and gamma 
random effects for the Weibull model can be con- 
sidered. This is, seemingly, a disparate collection. 
However, they are bound together by the property 
of conjugacy, in the sense of Cox and Hinkley (1974), 
page 370, and Lee, Nelder and Pawitan (2006), 
page 178. The topic is also discussed by Agresti 
(2002). Informally, conjugacy refers to the fact that 
the hierarchical and random-effects densities have 
similar algebraic forms. Conjugate distributions pro- 
duce a general and closed-form solution for the cor- 
responding marginal distribution. 

We will first define conjugacy as is conventionally 
done, that is, in models without the normal random 
effects and then, in Section 4, introduce a further 
property, strong conjugacy, necessary for situations 
where both normal and conventional conjugate ran- 
dom effects are present. To simplify notation, we will 
provide the definition at a general distribution level, 
with neither subject- nor measurement-specific sub- 
scripts, so that it can be applied to both univariate 
and longitudinal data. The hierarchical and random- 
effects densities are said to be conjugate if and only 
if they can be written in the generic forms 

(5) f(y\9) = exp{ ( f>- 1 [yh(9) - g(9)} +c(y, </>)}, 

(6) f(6) = exp{ 7 [#(0) - g(9)} + c*( 7 , V)}, 
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where g(0) and h{6) are functions, <fi, 7 and ip are 
parameters, and the additional functions c(y, 4>) and 
c *(7)^) are so-called normalizing constants. It can 
then be shown, upon constructing the joint distri- 
bution and then integrating over the random effect, 
that the marginal model resulting from (5) and (6) 
equals 



f{y) = exp 



(7) 



c(y,(t>) +c*(7,^) 



■7, 



-i + 7 



Table 1 gives model elements, such as density or 
probability mass functions, conditional on random 
effects and marginalized over these, as well as the 
random effects distributions. For all models consid- 
ered, the constants and functions featuring in (5)— 
(6) are listed, and finally marginal means and vari- 
ances are provided. For some models, these are well 
known (Hinde and Demetrio, 1998a, 1998b) and/or 
easy to derive. For the time-to-event models, a sketch 
can be found in Appendix E. While there, the fo- 
cus is on the combined version of Section 4.8, the 
over dispersion case considered here follows as a spe- 
cial case. 

In the case of binary data, the model in Table 1 
is the familiar beta-binomial model. Note that the 
variance still obeys the usual Bernoulli variance struc- 
ture. This is entirely natural, given that we still fo- 
cus on a single binary outcome, in contrast to the 
more conventional binomial basis model, where data 
of the format 11 z% successes out rtj trials" are con- 
sidered. We do not consider this situation in this 
section, but rather leave it to Section 4. In such a 
case, the variance structure becomes 7r,(l — vr^) [1 + 
Pi{rii — 1)], where pi is a measure for correlation. 
All parameters, pi and pi, can be expressed in terms 
of OLi and fy, "cluster-specific" versions of the beta 
parameters. 

For count data, the familiar negative-binomial 
model results. Unlike in the binary case, univari- 
ate counts are able to violate the mean-variance 
relationship of the Poisson distribution, hence the 
great popularity of this and other types of models for 
overdispersion. The same applies to the exponential 
distribution. Of course, already the Weibull model, 
with its extra parameter p, alleviates the constraint. 

The normal distribution case is a special one. Not 
only is it self-conjugate, also the model is not iden- 
tified, unlike all others. This is because both ran- 
dom terms, seen from writing Y{ = p j + b{ + e % , are 



in direct, linear relationship with each other. In the 
generalized linear context, the various random terms 
have no direct linear alliance. The normal case will 
continue to be "the odd one out" in models to come 
(Sections 3.3 and 4). 

The parameters a and j3 in the beta and gamma 
distributions are not always jointly identified. It is 
therefore customary to impose restrictions, such as 
setting one of them equal to a fixed value, for exam- 
ple, a = 1, or constraining their mean or variance, 
etc. Such constraints operate differently, depending 
on other elements present in the models. For exam- 
ple, the presence of additional random effects in a 
model for repeated measures, such as in Section 4, 
alters the meaning and restrictiveness of such con- 
straints. 

Recall that the models at the bottom part of Ta- 
ble 1 are not the only options, but rather common, 
elegant choices, where the elegance draws to a large 
extent from conjugacy. 

3.3 Models with Normal Random Effects 

The generalized linear mixed model (Engel and 
Keen, 1994; Breslow and Clayton, 1993; Wolfinger 
and O'Connell, 1993) is likely the most frequently 
used random-effects model in the context of perhaps 
non-Gaussian repeated measurements. Not only is it 
a relatively straightforward extension of the general- 
ized linear model for independent data (Section 3.1) 
to the context of hierarchically organized data, on 
the one hand, and the linear mixed model (Verbeke 
and Molenberghs, 2000), on the other hand, but 
there is also a wide range of software tools available 
for fitting such models. 

Let Yij be the jth outcome measured for cluster 
(subject) i = 1, . . . , N, j = 1, . . . , re, and group the 
m measurements into a vector Yj. Assume that, 
in analogy with Section 3.1, conditionally upon q- 
dimensional random effects bj ~ N(0,D), the out- 
comes are independent with densities of the form 



(8) 



/i(2/ij|bi,£,i 



with 



(9) 



exp{(/> 1 [y i jXi j -ip{\ij)] +c(yij,(j))}, 



r}W{hj)]= vM = r)[E(Yij\bi,Z)] 



for a known link function rj(-), with Xj,- and Zjj 
p-dimensional and (/-dimensional vectors of known 
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covariate values, with £ a p-dimensional vector of 
unknown fixed regression coefficients, and with (j> a 
scale (overdispersion) parameter. Finally, let f(hi\D) 
be the density of the N(0,D) distribution for the 
random effects bj. 

These models closely follow the ones formulated 
in the top part of Table 1, with key differences that 
now: (1) data hierarchies are allowed for, in our set- 
ting owing to the longitudinal collection of data; (2) 
the natural parameter is written as a linear predic- 
tor, a function of both fixed and random effects. 

Obviously, such models can be formulated for all 
data settings considered in Table 1 and beyond. This 
is conventionally done for continuous, Gaussian data, 
producing the linear mixed-effects model (Verbeke 
and Molenberghs, 2000), as well as for binary data 
and counts. This kind of model is a bit less com- 
mon for survival data, where so-called frailty mod- 
els (Duchateau and Janssen, 2007), rather of the 
type described in Section 3.2, are more standard. 
Of course, also the accelerated failure time model 
with random effects deserves mention, given that it 
takes the form of a linear mixed model for logarith- 
mic time. 

We will not consider explicit expressions for such 
models here, because they are relatively well stud- 
ied (Fahrmeir and Tutz, 2001; Molenberghs and Ver- 
beke, 2005) and, at any rate, conveniently follow as 
special cases from the combined models of Section 4. 

4. MODELS COMBINING CONJUGATE AND 
NORMAL RANDOM EFFECTS 

4.1 General Model Formulation 

Integrating both the overdispersion effects of Ta- 
ble 1 (Section 3.2) as well as the normal random ef- 
fects of Section 3.3 into the generalized linear model 
framework produces the following general family: 

fi(yij\^i,^,dij,^) 

(10) 

= exp{(/> 1 [y ij Xij - ip(Xij)} + c(yij,4>)}, 

with notation similar to the one used in (8), but now 
with conditional mean 

(11) E(Y ij \b i ,^9 ij )= f i c ij = e ij K ij , 

where the random variable 6ij ~ Qij , afj) , = 
g{x'ij£ + Z'jbj), •dij is the mean of and o\- is 
the corresponding variance. Finally, as before, bj ~ 
N(0,D). Write = + z^bj. Unlike in Sec- 
tion 3.3, we now have two different notations, rjij 



and Ajj, to refer to the linear predictor and/or the 
natural parameter. The reason is that Xij encom- 
passes the random variables dij, whereas r]ij refers 
to the "GLMM part" only. 

It is convenient, but not strictly necessary, to as- 
sume that the two sets of random effects, Oi and 
bj, are independent of each other. Regarding the 
components 6{j of 6i, three useful special cases re- 
sult from assuming that: (1) they are independent; 
(2) they are correlated, implying that the collection 
of univariate distributions Qijil&ij-,^) needs to be 
replaced with a multivariate one; and (3) they are 
equal to each other, useful in applications with ex- 
changeable outcomes Yij. 

Obviously, parameterization (11) allows for ran- 
dom effects 6ij capturing overdispersion, and for- 
mulated directly at mean scale, such as described 
in Section 3.2, whereas rey could be considered the 
GLMM component, as in Section 3.3. The relation- 
ship between mean and natural parameter now is 

(12) Xij = fc(jt?.) = h(0ijKij). 

We can still apply standard GLM ideas, in partic- 
ular, (2) and (3), to derive the mean and variance, 
combined with iterated-expectation-based calcula- 
tions. For the mean, it follows that 

(13) EiYij) = E{e i3 )E(K l3 ) = Elh-^Xij)]. 

4.2 Generic Approximations for Marginal Model 
Elements 

As we will see in ensuing specific cases (Sections 4.4- 
4.8), (13) allows for explicit expressions in a good 
number of cases. Generic mean, variance and co- 
variance approximations can be derived using the 
expansion, around bj = 0, 

Kij^g(vij)+g'(vij) z 'ij h i + y'(vij) z 'ij h M z ij- 

Details and expressions are provided in Appendix A. 

4.3 Strong Conjugacy 

In Section 3.2 the concept of conjugacy was in- 
troduced and exemplified in a number of cases (see 
Table 1). It is of interest to explore under what con- 
ditions Model (10) still allows for conjugacy. The 
complication is the presence of the multiplicative 
factor Kij in the mean structure. To make progress, 
we will study how conjugacy plays out between 
Model (10) and the distribution of the random ef- 
fect Oij, given the multiplicative factor k^. In other 
words, conjugacy will be considered conditional upon 
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the normally-distributed random effect bj. To this 
effect, write (suppressing nonessential arguments 
from the functions) 



(14) 



f(y\K0)=exp{<l>- 1 [yh(K0)-g(KB)] 

+ c(y, </>)}, 



generalizing (5), and retain (6). Applying the trans- 
formation theorem to (6) leads to 

f(0\ 1 ,^)=K-f(K9\j,$). 

Next, we request that the parametric form (6) be 
maintained: 



(15) 



f( K 6)=exp{j*[<P*h(K9)-g(K9)} 



where the parameters 7* and ip* follow from 7 and 
ip upon absorption of k. Then, the marginal model, 
in analogy with (7), equals 



/(y| K )=exp<j C (y,0) + C **( 7 *,V* 
(16) 



+ c \<j> +7 ,- 



-i + 7 * 

Evidently, not every model satisfying conjugacy in 
the sense of Section 3.2 will allow for the present 
form of conjugacy. We will refer to this condition 
as strong conjugacy. Examples include the normal, 
Poisson and Weibull (and hence exponential) models 
with normal, gamma and gamma random effects, 
respectively. A counterexample is provided by the 
Bernoulli, and hence also binomial, model. Because 
the probit model does not allow for conjugacy, not 
even in the usual sense, it is out of the picture here, 
too. The latter does not preclude the existence of 
closed forms in the probit case, as we will see in 
Section 4.7. 

Note that the transition from strong conjugacy is 
a property entirely of the random-effects distribu- 
tion, and not of the data model, the latter of which 
is needed, of course, for conjugacy itself. For exam- 
ple, for gamma random effects, we can write 



i/(0| a ,/3) = i— * , 
k k p a T(a) 



\a-i e -e/p 



(17) 



KB) a Ta) K ' 



(Kp) a T(a) 
f(K(3\a,n/3) 



and, hence, a scaled version of a gamma random 
effect is still a gamma random effect, with retention 
of a and rescaling of j3 to nj3. 

The importance of strong conjugacy lies, among 
others, in the easy integration over the nonnormal 
random effects 9ij. As a consequence, the resulting 
density is conditional on k and hence on bj only, im- 
plying that standard software for generalized linear 
or nonlinear mixed-effects models, such as the SAS 
procedure NLMIXED, can be employed, a point to 
which we will return in Section 5. 

We will now consider the normal, Poisson, binary 
and time-to-event cases in turn. Details of the cal- 
culations for the Poisson case are given in Molen- 
berghs, Verbeke and Demetrio (2007) and summa- 
rized in Appendix B, while the binary and time-to- 
event cases are supported by Appendices C, D and 
E, respectively. 

There is no need to spell out the various models in 
detail. The different versions of (10) follow straight- 
forwardly upon combining the models formulated in 
Table 1 with the GLMM (8) and corresponding lin- 
ear predictor (9). Precisely, the effect 9 ought to 
be replaced by where Kij is defined by set- 

ting 77 = rjij equal to the linear predictor whence Kij 
is expressed, for the respective models, as //, 7r, A 
and cj). 

4.4 Specific Case: Continuous, Normally 
Distributed Data 

The fully hierarchically specified linear mixed-ef- 
fects model takes the form (Verbeke and Molen- 
berghs, 2000) 

(18) Y i |b i ~JV(^ + Z i b i ,E i ), 

(19) b i ~JV(0,D), 

where £ is a vector of fixed effects, and and Z{ are 
design matrices. The rows of Aj£ + Zjbi are made 
up by the linear predictors (9). 

Based upon (18) and (19), the marginal model can 
be derived: 



(20) 



Y i ~N(X i £,V i = Z i DZ' i + i; i 



We evidently consider a single set of random ef- 
fects only, because, in this case, the normal and con- 
jugate random effects coincide, a unique feature of 
the normal model. Strong conjugacy is a fortiori ev- 
ident. 
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4.5 Specific Case: Poisson-Type Models for 
Count Data 

From the general developments above, the Pois- 
son model with gamma and normal random effects 
combined naturally follows. By way of overview, let 
us assemble all model elements: 



(21) Yij ~Poi(%^), 

(22) Kij = exp(x-^ + z-jbj), 

(23) bi~JV(0,I>), 

(24) E{O i )=E[{e il ,...An i )']-- 

(25) Var(0 4 ) = 



This model has the same structure of the one by 
Booth et al. (2003). In the spirit of Table 1, the 6^ 
can be assumed to follow a gamma model, produc- 
ing, what we could term, a Poisson-gamma-normal 
model or, equivalently, a negative-binomial-normal 
model. When the gamma distribution is chosen, it is 
implicitly assumed that the components 8ij of Oi are 
independent. This is natural in many cases, in the 
sense that the bj will induce association between re- 
peated measurements, with then the 8ij taking care 
of additional dispersion. In this case, £j reduces to a 
diagonal matrix. Nevertheless, it is perfectly possi- 
ble to allow for general covariance structures. When 
a fully distributional specification would be desired, 
then one could choose, for example, multivariate ex- 
tensions of the gamma model (Gentle, 2003). 

As stated in general above, regarding the overdis- 
persion random effects, three situations could be of 
interest: (1) the random-effects Oij are independent; 
(2) they are allowed to be dependent; (3) they are 
equal to each other and hence reduce to 9^ = 9i . 

The marginal mean vector and variance-covariance 
matrix are derived in Appendix B. The existence 
of such closed forms has important implications be- 
cause they allow, for example, for explicit correla- 
tion expressions, on the one hand, and for a more 
versatile collection of estimation methods, on the 
other hand, a point to which we will return in Sec- 
tion 5. The availability of closed- form variance and 
joint-probability expressions supplements the work 
of, for example, Zeger, Liang and Albert (1988), who 
had stated that only explicit mean expressions are 
available for a limited number of generalized linear 
mixed models, other than the linear mixed model. 



Let us consider strong conjugacy in this case. The 



6 - -0 - ln[/3 Q r(a)] 



corresponding 


model element 


f(0) = 


exp^ (a — 1) In 


f{y\\ = dK) = 


exp{y In 6 — k8 


4> = 


1, 


h{6) = 


ln0, 


9(0) = 


9k, 


7 = 


((3k)-\ 


v> = 


f3n(a — 1), 


c(y,4>) = 


lny! + y hiK, 


c*(<y,ip) = 


(1 + tpj) In 7K - 







Recall that the crux behind this result is (17). 

Even though Molenberghs, Verbeke and Demetrio 
(2007) did not do so, it is fairly straightforward to 
derive the moments. Employing the moments' ex- 
pression for the standard Poisson (Johnson, Kemp 
and Kotz, 2005, page 162), the expression condi- 
tional upon the random effects is 



(26) 



E(Y*) = J2s(k,e)(e. 



ij K>ij ) i 



where S(k,£) is the so-called Stirling number of the 
second kind. Integrating (26) over the random ef- 
fects produces, without any problem, 



(27) 



1 r(«) 



cxp 



4.6 Specific Case: Bernoulli-Type Models for 
Binary Data with Logit Link 

Similar to the Poisson case in Section 4.5, a nat- 
ural binary-data counterpart to (21)-(25) is 

(28) ~ Bernoulli(7Tjj = OijKij), 

_ exp(x^ + z^) 

[ } y "l + exp(x^ + Z ^)' 

completing the specification with (23)-(25). Unlike 
in the Poisson case, closed forms for neither the 
mean nor the variance follow when normal random 
effects are present. When only over dispersion ran- 
dom effects are included, especially when they are 
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assumed to follow a beta distribution, as in Table 1, 
conjugacy applies. However, the beta distribution 
does not allow for the multiplicative invariance as 
(17), which will preclude strong conjugacy. 

When the over dispersion random effects ("11 C RjS - 
sumed to be equal, Oij = 0i, then the beta-binomial 
model would follow if no normal random effects are 
present. The same is true, by the way, for the 
compound-symmetry model generated by the hier- 
archical random-intercepts model in the Gaussian 
case. 

Explicitly considering Oij ~ Beta(a,/3), then faj = 
a/ (a + P), and 



Simple algebra then shows 



a ij — a i,jj 



a(3 



&i,jk Pijk 



(a + /3) 2 (a + /3 + l)' 
a/3 

; (a + /3) 2 (a + /3 + l)' 

Observe that there are two correlations: p^k, which 
described the correlation between draws from the 
beta distribution and (a + (3 + 1) . It is of course 
possible to let a and /3 vary with i and/or j. In such 
cases, the above and below expressions will change 
somewhat, but computations are straightforward. 

Using the general expressions, the above results 
can be used to derive approximate expressions for 
means and variance-covariance elements. For the 
special case of no normal random effects, but main- 
taining the fixed effects in (29), that is, 



(30) 



we obtain 



1 y 



exp(x^) 
l + exp(x^)' 



E(Yi. 



(31) Var(^) 



a 



a + P 



a 



Cov(Yij,Y ik ) = pij k 



a 



aft 



{a + P) 2 {a + P + l) 



If we further make exchangeability assumptions, that 
is, km = Kik = Ki and pijk = Pi, further simplifica- 
tion follows. Finally, setting Ki = 1, the conventional 
beta-binomial follows. It is then easy to derive the 
resulting binomial version by defining 



(32) 



Zi —y~]Yjj. 



E(Z l )=n 
Var(Zj) = n 



a 



a + P 
aP 



Tli-Ki, 



l + (ra; + l)- 



(a + P) 2 { ' v "' ' a + P + 1 
= riiiTi(l - m){l + (rii - I) pi}, 

with pi the beta-binomial correlation. Hence, the 
conventional beta-binomial model follows. 

In comparison to the longitudinal Poisson case, 
the longitudinal binary case appears to defeat closed- 
form solutions and strong conjugacy. However, this 
hinges on the fact that we employ the logit link. In 
spite of it being a very natural choice in the univari- 
ate case, it does not combine very nicely with normal 
random effects. Recall that this is known already 
from the GLMM framework for binary data. There- 
fore, it is sensible to study the probit link instead. 
The random-effects probit model has received some 
attention in earlier decades (Schall, 1991; Guilkey 
and Murphy, 1993; Hedeker and Gibbons, 1994; Mc- 
Culloch, 1994; Gibbons and Hedeker, 1997; Renard, 
Molenberghs and Geys, 2004), with emphasis pri- 
marily on computational schemata to deal with the 
multivariate normal integral. We will return to this 
aspect in Section 5. 

4.7 Specific Case: Bernoulli-Type Models for 
Binary Data with Probit Link 

Introducing the probit version of the model, while 
at the same time assuming that the overdispersion 
parameters are beta distributed, comes down to 



(33) 
(34) 



«i 3 - = $i(x^£ + Zybi), 
0^ ~ Beta(a, P). 



Like before, a and P could be allowed to vary with 
i and/or j. 

It now follows that the joint distribution can be 
written as (details in Appendix D) 



(35) /^(y< = l) 



with 



a 



a + P 



(36) L ni =I ni -Zi{D- x +Z[Zi 



- x z>. 



i=l 



More details on the cell probabilities, as well as on 
means and variances, can be found in Appendix D. 

It is important to note that the existence of closed- 
form expressions for the probit case opens a window 
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of opportunity for the logit case. Indeed, the well- 
known approximation formulae, linking the normal 
and logistic densities, proves useful here. As shown 
in Johnson and Kotz (1970), page 6, and used in 
Zeger, Liang and Albert (1988), 



be expressed as 



(41) 



(37) 



$i(cy), 



with c = (16\/3)/(15tt). Applied to (28)-(29), we 
find 



exp(x^ + z^.b 



(43) 



/(bi) 



1 



(2n)i/ 2 \D\ 1 / 2 



-(l^b^-ib. 



(38) 



l + exp(x^ + z^.b i ) 

%$i[c(x^ + z^.bi)]. 
Applying (38) to (35) yields 



(39)/ ni ( yi = l) 



a: 



a + /3 



■^(cX&L-i), 



with 



For the expectation, we find, based on (38) and 
(D.4) 



(40) 



a, 



with similar expressions for the variance and co- 
variance terms. Note that, upon estimating the pa- 
rameters within the probit approximation paradigm, 
back-transformation to the original logit scale is pos- 
sible, using expressions such as (38) and (40). This 
opens perspectives for alternative estimation meth- 
ods for the combined model with logit link, with the (^) f(@i) — 11 Q 
important special case of the normal- logistic GLMM. 

In the Bernoulli case, calculating the moments is 
extremely simple. Indeed, the Bernoulli moments 
are all identical. The conditional moments are all 
E(Y^\9ij,bi) = OijKij (k = 1,2, . . .). Hence, they all 
reduce to (31). In the probit case, they equal to 
(D.4). 



A few observations are in place. First, it is implicit 
that the gamma random effects are independent. 
This need not be the case and, like in the Pois- 
son case, extension via multivariate gamma distri- 
butions is possible. Second, setting p = 1 leads to the 
special case of an exponential time-to-event distri- 
bution. Third, it is evident that the classical gamma 
frailty model (i.e., no normal random effects) and 
the Weibull-based GLMM (i.e., no gamma random 
effects) follow as special cases. Fourth, owing to the 
conjugacy result of Table 1 and property (17) of the 
gamma density, strong conjugacy applies. This is 
typically considered for the exponential model, but 
it holds for the Weibull model too, merely by ob- 
serving that the Weibull model is nothing but an 
exponential model for the random variable YP-. It 
is equally possible to derive this result by merely 
rewriting the factor <f> = \k. Fifth, the above ex- 
pressions are derived for a two-parameter gamma 
density. It is customary in a gamma frailty context 
(Duchateau and Janssen, 2007) to set dj/3j = 1, for 
reasons of identifiability. In this case, (42) is replaced 
by 



l 



L (l/a^T^) 



/.V '>.;";.. 



Alternatively, assuming ay = 1 and = 1/5 j, one 
could write 



(45) 



4.8 Specific Case: Weibull- and 

Exponential-Type Models for Time-to- Event 
Data 

The general Weibull model for repeated measures, 
with both gamma and normal random effects, can 



implying that the gamma density is reduced to an 
exponential one. Closed-form expressions for the 
marginal density, means, variances, covariances and 
moments are derived in Appendix E, where also a 
number of related facts are derived. 

Of course, in this context of time-to-event data, 
further issues that deserve attention are as follows: 
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(1) censoring and how to deal with it; (2) derivation 
of related functions, such as the survivorship func- 
tion, as well as the hazard, cumulative hazard and 
intensity functions; (3) the possibility of nonpara- 
metric baseline hazard functions. These are never- 
theless not considered here. While in principle pos- 
sible, we aim at focusing on commonality between 
various GLM settings. 

4.9 Implication for Computation of Correlation 
and Derived Quantities 

Up to here, we have provided closed-form expres- 
sions for the marginal joint distributions, the mo- 
ments, and hence for means and variances, for the 
normal, Poisson, probit and Weibull cases, with a 
combination of normal random effects, on the one 
hand, supplemented, on the other hand, with conju- 
gate random effects, taking a normal, gamma, beta 
and gamma form, respectively. The obvious one miss- 
ing from the list is the logit model, but then the 
logit-probit connection, as discussed in Section 4.7, 
comes to rescue. Generally, progress is possible when- 
ever strong conjugacy applies. 

These results and the ensuing calculations are use- 
ful for a number of reasons, such as: (1) parameter 
estimation and derived inferences; (2) implementa- 
tion of estimation algorithms, as will be discussed in 
Section 5; and (3) the computation of derived quan- 
tities. 

Such derived quantities include marginal correla- 
tion coefficients, about which more detail is provided 
in the Appendix (Section F). Of course, correlations 
are not always of direct scientific interest and, when 
they are, one might not be willing to base one's 
entire model choice on whether or not closed-form 
correlations are available. That said, some consider- 
ations are in place. 

First, our results indicate that closed- form corre- 
lations exist for a number of commonly used mod- 
els, such as the Poisson-normal GLMM and the 
Weibull-gamma frailty model. Second, the same holds 
true for their extensions within our proposed model. 
Third, when studying psychometric reliability and 
generalizability (Vangeneugden et al., 2008a, 2010), 
the correlation function is the basic building block. 
Fourth, correlation functions are also used in the 
context of surrogate marker evaluation from clinical- 
trial data (Burzykowski, Molenberghs and Buyse, 
2005). 

At the same time, the one important situation 
that evades direct calculation of the marginal cor- 
relation is the logit with beta and normal random 



effects, but then the probit-logit correspondence can 
be invoked. On the one hand, the probit link can be 
used in lieu of the logit link; on the other hand, the 
calculations can be carried out on the probit scale, 
where after the results can be back transformed to 
the logit scale. 

Other key derived quantities include marginal re- 
gression parameters. Suppose, for example, that one 
is interested in estimating the marginal treatment 
effect from longitudinal clinical-trial data that are 
not normally distributed. In principle, a marginal 
model could be fitted, which oftentimes is done via 
generalized estimating equations (Liang and Zeger, 
1986). However, when data are incomplete, such mod- 
els pose specific challenges even though remedies 
have been devised, such as inverse probability weight- 
ing or a combination with multiple imputation (for 
reviews, see Fitzmaurice et al., 2009). These, how- 
ever, come with their own problems. It is then at- 
tractive to fit a GLMM, with or without additional 
random effects for overdispersion, and use the closed- 
form mean expressions to derive marginal mean func- 
tion. The estimand of interest is then E{Yi\Ti = 
1) — E{Yi\Ti = 0), where Tj is the obvious indicator 
for the treatment to which the ith subject has been 
assigned. Precision estimation then proceeds via the 
delta method. 

5. ESTIMATION 

A priori, fitting a combined model of the type de- 
scribed in Section 4 proceeds by integrating over the 
random effects. The likelihood contribution of sub- 
ject i is 

/m 
HIUHIU d.\>;.0;)/ { \>; D) 

J'=l 

Here, 1? groups all parameters in the conditional 
model for Yj. From (46) the likelihood derives as 

L(0,£>,0,£) 

JV 

(47) 

= 11 / Y[fij(yij\#M,Oi)f(bi\D) 

i=l J j=l 
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The key problem in maximizing (47) is the presence 
of N integrals over the random effects bj and 9. It is 
widely claimed that the absence of a closed-form so- 
lution precludes an analytical-integration based so- 
lution (Molenberghs and Verbeke, 2005), explain- 
ing the popularity of Taylor-series expansion based 
methods, such as PQL and MQL, Laplace approx- 
imation and numerical-integration based methods. 
These have been implemented in, for example, the 
SAS procedures GLIMMIX and NLMIXED. Several 
of the series expansion methods tend to exhibit bias, 
an issue taken up in Breslow and Lin (1995), and 
suggesting the use of alternative methods. 

However, thanks to our results in Section 4, fur- 
ther progress can be made. Closed-form integration, 
apart from the normal case, is within reach for the 
Poisson, probit and Weibull cases. Now, some closed 
forms involve series expansions, and may be either 
time consuming or cumbersome to implement. This 
notwithstanding, a variety of alternative approaches 
are possible. 

Let us turn to the Poisson case. While closed- 
form expressions can be used to implement max- 
imum likelihood estimation, with numerical accu- 
racy governed by the number of terms included in 
the series, one can also proceed by what we will term 
partial marginalization. By this we refer to integrat- 
ing (21)-(25) over the gamma random effects only, 
leaving the normal random effects untouched. The 
corresponding probability is 



(48) 



Oij + Vij ~ 1 
«j - 1 

1 

1 + 



Pi 



1 + Kijfy 



where Kij = exp[x^ £ + z^-bj]. Note that, with this 
approach, we assume that the gamma random ef- 
fects are independent within a subject. This is fine, 
given the correlation is induced by the normal ran- 
dom effects. 
Similarly, for the Weibull case we obtain 



(49) f( yij \bi 



Because there is lack of strong conjugacy, the logit 
case defies the mere exploitation of conjugate form, 
such as the negative-binomial form (48) and the 
Weibull-gamma frailty form (49). Nevertheless, it 



is easy to derive, for this case 

1 



f{yij\bu 



(50) 



.[(l_« ij .)a i + i f-™. 



For all of these, it is straightforward to obtain the 
fully marginalized probability by numerically inte- 
grating the normal random effects out of (48), (49) 
and (50), using a tool such as the SAS procedure 
NLMIXED that allows for normal random effects in 
arbitrary, user-specified models. 

The concept of partial integration always applies 
whenever strong conjugacy holds. Indeed, an expres- 
sion of the form (16) corresponds to integrating over 
the conjugate random effect 9, while leaving the nor- 
mally distributed random effect embedded in the 
predictor, k in this notation. Recall that, while ex- 
pressions of the type (16) appear to be for the uni- 
variate case, they extend without problem to the 
longitudinal setting as well. 

For the specific case of the marginalized probit 
model, the computational challenge stems from the 
presence of a multivariate normal integral of the 
form (35), a phenomenon also known from the fully 
marginally specified multivariate probit model (Ash- 
ford and Sowden, 1970; Lesaffre and Molenberghs, 
1991; Molenberghs and Verbeke, 2005). Specific to 
the context of the probit models with random ef- 
fects, Zeger, Liang and Albert (1988) derived the 
marginal mean function, needed for their applica- 
tion of generalized estimating equations fitting 
algorithm for the marginalized probit model. It is 
one of the first instances of the use of GEE to a non- 
marginally specified model. Precisely, these authors 
derive the marginal mean function and (a working 
version of) the marginal variance-covariance matrix. 
These are sufficient to implement GEE or, with ap- 
propriate extension, also second-order GEE. Note 
that our derivations yield, for strong conjugate cases 
in general, as well as for a number of particular 
cases, not only the marginal mean and variance, 
but also all moments and the entire joint distribu- 
tion. Evidently, this is plenty to implement GEE, 
but the other methods, described in this section, 
come within reach, too. 

In the same spirit, pseudo-likelihood can be used 
(Aerts et al., 2002; Molenberghs and Verbeke, 2005). 
This is particularly useful when the joint marginal 
distribution is available but cumbersome to manip- 
ulate and evaluate, such as in the probit case. This 
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is the idea followed by Renard, Molenberghs and 
Geys (2004) for a multilevel probit model with ran- 
dom effects, similar in spirit to the probit models 
considered in Section 4.7. Essentially, the joint dis- 
tribution is replaced with a product of factors of 
marginal and/or conditional distributions of lower 
dimensions. Because such a product does not nec- 
essarily recompose the original joint distribution, 
sandwich-estimator ideas are then used to provide 
not only valid point estimates, but also precision es- 
timates and inferences derived therefrom. 

Schall (1991) proposed an efficient and general es- 
timation algorithm, based on Harville's (1974) mod- 
ification of Henderson's (1984) mixed-model equa- 
tions. Hedeker and Gibbons (1994) and Gibbons 
and Hedeker (1997) proposed numerical-integration 
based methods, thus considering neither marginal 
moments (means, variances) nor marginalized joint 
probabilities. Guilkey and Murphy (1993) provide 
a useful early overview of estimation methods and 
then revert to Butler and Moffit's (1982) Hermite- 
integration based method, supplemented with Monte 
Carlo Markov Chain ideas. 

Further, one might, for example, opt for fully Bayes- 
ian inferences. Alternatively, the EM algorithm can 
be used, in line with Booth et al. (2003) for the 
Poisson case. The EM is a flexible framework within 
which either the conjugate, or the normal, or both 
sets of random effects can be considered the "miss- 
ing" data over which expectations are taken. 

Booth et al. (2003) also considered nonparametric 
maximum likelihood, in the spirit of Aitkin (1999) 
and Alfo and Aitkin (2000). In addition, ideas of hi- 
erarchical generalized linear models (Lee and Nelder, 
1996, 2001a, 2001b, 2003; Yun, Sohn and Lee, 2006; 
Lee, Nelder and Pawitan, 2006) can be employed. 

A suite of methods is available that employ trans- 
formation results, essentially based on transforming 
the nonnormal random effects to normal ones, or 
vice versa. To briefly describe these, write the con- 
tribution for subject i to the likelihood as 



(51) 



\f{Vij\ui) 



p u (ui)dv,i 



where /(•) specifies the outcome model given the 
random effects. Furthermore, £>«(•) denotes the den- 
sity of the random effect, typically nonnormal. While 
the latter random effect can be vector- valued, let us 
illustrate the method for the scalar case. To simplify 
notation further, in (51), covariates and parameter 



vectors have been suppressed from notation. Liu and 
Yu (2008) advocate a simple transformation: 



(52) Li 



a,- 



■cj)(ai) dcii, 



where now o« is a normal random effect. Evidently, 
4>(-) is the standard (multivariate or univariate) nor- 
mal density. Liu and Yu (2008) complete their argu- 
ment by stating that then the new model 

Pu(ai) 



can be subjected to the conventional quadrature 
techniques available in, for example, SAS' NLMIXED 
procedure. A number of SAS implementations for 
important particular cases are offered by these au- 
thors. Obviously, the method can be expanded to 
our situation, where apart from the nonnormal ran- 
dom effects, also normal random effects are present. 
The justification of the method simply follows by 
applying the transformation theorem at the level of 
the densities involved. The usefulness of this method 
cannot be overestimated. It is especially useful when 
partial integration is not possible, for example, when 
strong conjugacy does not hold, like in the binary 
beta-normal-logit case. 

Alternatively, Nelson et al. (2006) advocate the 
transformation 



(53) 



U: 



where F u is the cumulative distribution function 
(CDF) of Ui and <£(•) is the standard normal CDF, 
as before. Nelson et al.'s method, labeled probability 
integral transformation (PIT), comes down to gener- 
ating normal variates and then inserting these in the 
model only after transformation (53), ensuring that 
they are of the desired nature. It is tautologically 
clear that (53) automatically ensures the support of 
the variable is correctly mapped along with the vari- 
able itself. By passing through the unit interval, by 
means of <£(•), and then applying F u (-), one forces, 
for example, a gamma variable to range over the 
positive half line, a beta variable to be confined to 
the unit interval, etc., as it should. 

Lin and Lee (2008) present estimation methods for 
the specific case of linear mixed models with skew- 
normal, rather than normal, random effects. 

Quite apart from the choice of estimation method, 
it is important to realize that not all parameters 
may be simultaneously identifiable. For example, the 
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gamma-distribution parameters in the Poisson case, 
a and /?, are not simultaneously identifiable when 
the linear-predictor part is also present, because there 
is aliasing with the intercept term. Therefore, one 
can set, for example, f3 equal to a constant, removing 
the identifiability problem. It is then clear that a, in 
the univariate case, or the set of otj in the repeated- 
measures case, describe the additional overdisper- 
sion, in addition to what stems from the normal 
random effect (s). A similar phenomenon also plays 
in the binary case, where both beta-distribution pa- 
rameters are not simultaneously estimable. 

6. ANALYSIS OF CASE STUDIES 
6.1 A Clinical Trial in Epileptic Patients 

We will analyze the epilepsy data, introduced in 
Section 2.1. Note that the data were analyzed before 
in Molenberghs and Verbeke (2005), Chapter 19, 
using generalized estimating equations (Liang and 
Zeger, 1986) and the Poisson-normal model. These 
authors used a slightly different parameterization. 

Let Yij represent the number of epileptic seizures 
patient i experiences during week j of the follow-up 
period. Also, let tij be the time-point at which 
has been measured, = 1,2, ... , until at most 27. 
Let us consider the combined model (21)— (25), with 
specific choices 



(54) InK) 



(Coo + h) + £oikj, 

(£io + &i) + 6.i%) 



if placebo, 
if treated, 



where the random intercept bi is assumed to be zero- 
mean normally distributed with variance d. We con- 
sider special cases: (1) the ordinary Poisson model, 
(2) the negative-binomial model, (3) the Poisson- 
normal model, together with (4) the combined model. 
Estimates (standard errors) are presented in Ta- 
ble 2. Clearly, both the negative-binomial model and 
the Poisson-normal model are important improve- 
ments, in terms of the likelihood, relative to the or- 
dinary Poisson model. This should come as no sur- 
prise since the latter unrealistically assumes there 
is neither overdispersion nor correlation within the 
outcomes, while clearly both are present. In addi- 
tion, when considering the combined model, there 
is a very strong improvement in fit when gamma 
and normal random effects are simultaneously al- 
lowed for. This strongly affects the point and preci- 
sion estimates of such key parameters as the slope 
difference and the slope ratio. There is also an im- 
pact on hypothesis testing. The Poisson model leads 



to unequivocal significance for both the difference 
(p = 0.0008) and ratio (p = 0.0038), whereas for the 
Poisson-normal this is not the case for the differ- 
ence of the slopes (p = 0.7115), while some signifi- 
cance is maintained for the ratio (p = 0.0376). Be- 
cause the Poisson-normal is commonly used, it is 
likely that in practice one would decide in favor of 
a treatment effect when considering the slope ratio. 
This is no longer true with the negative-binomial 
model, where the p- values change to p = 0.01310 and 
p = 0.2815, respectively. Of course, one must not for- 
get that, while the negative-binomial model accom- 
modates overdispersion, the 0™ random effects are 
assumed independent, implying independence be- 
tween repeated measures. Again, this is not realistic 
and, therefore, the combined model is a more viable 
candidate, corroborated further by the aforemen- 
tioned likelihood comparison. This model produces 
nonsignificant p- values of p = 0.2260 and p = 0.1591, 
respectively. 

Thus, in conclusion, whereas the conventionally 
used and broadly implemented Poisson-normal model 
would suggest a significant effect of treatment, our 
combined model issues a message of caution, be- 
cause there is no evidence whatsoever regarding a 
treatment difference. 

Molenberghs and Verbeke (2005), Chapter 19, con- 
sidered a Poisson-normal model with random inter- 
cepts as well as random slopes in time. It is interest- 
ing to note that, when allowing for such an extension 
in our models, the random slopes improve the fit of 
the Poisson-normal model with random intercept, 
but not of the combined one with random intercept 
(details not shown). As a consequence, the combined 
model with random intercept is the best fitting one. 
At the same time, note that fitting such a model 
establishes that the presence of a conjugate random 
effect does not preclude the consideration of normal 
random effects beyond random intercepts. 

Recall that the data were analyzed, too, by Booth 
et al. (2003). While we considered four different mod- 
els, these authors focused on the Poisson-normal 
and combined implementations. There are further 
differences in actual fixed-effects and random-effects 
models considered, as well as in us further consider- 
ing inferences for differences and ratios. 

Let us now turn to the correlation functions. Given 
that the gamma random effects are assumed inde- 
pendent, we only need to consider the Poisson-normal 
and combined cases; the versions with and without 
random slopes are considered. Obviously, because 
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Table 2 

Epilepsy study. Parameter estimates and standard errors for the regression coefficients in (1) the 
Poisson model, (2) the negative-binomial model, (3) the Poisson-normal model and (4-) the combined 
model. Estimation was done by maximum likelihood using numerical integration over the normal 

random effect, if present 



Estimate (s.e.) 



i,- (r, w , . 
Jljnect 


Parameter 


Poisson 


Negative-binomial 


Intercept placebo 


£oo 


1.2662 (0.0424) 


1.2594 (0.1119) 


Slope placebo 


£oi 


-0.0134 (0.0043) 


-0.0126 (0.0111) 


Intercept treatment 


tw 


1.4531 (0.0383) 


1.4750 (0.1093) 


Slope treatment 


£u 


-0.0328 (0.0038) 


-0.0352 (0.0101) 


Negative-binomial parameter 


ai 




0.5274 (0.0255) 


Negative-binomial parameter 


a.2 — 1/ai 




1.8961 (0.0918) 


— 21og-likelihood 




-1492 


-6755 






Poisson-normal 


Combined 


Intercept placebo 


& 


0.8179 (0.1677) 


0.9112 (0.1755) 


Slope placebo 




-0.0143 (0.0044) 


-0.0248 (0.0077) 


Intercept treatment 


So 


0.6475 (0.1701) 


0.6555 (0.1782) 


Slope treatment 


& 


-0.0120 (0.0043) 


-0.0118 (0.0074) 


Negative-binomial parameter 


Ql 




2.4640 (0.2113) 


Negative-binomial parameter 


oli — 1/a-L 




0.4059 (0.0348) 


Variance of random intercepts 


d 


1.1568 (0.1844) 


1.1289 (0.1850) 


— 21og-likelihood 




-6810 


-7664 



Table 3 

Epilepsy study. Observed smallest and largest values for the correlation function, for the 
Poisson-normal and combined models, and for both treatment arms. The time pair for 
which the values are observed is shown too (RI — random intercept; RS — random slope) 



Smallest value Largest value 



Model 


Arm 


P 


Time 


pair 


P 


Time 


pair 


Poisson-normal, RI 


Placebo 


0.8577 


26 & 


; 27 


0.8960 


1 & 


; 2 


Poisson-normal, RI 


Treatment 


0.8438 


26 & 


; 27 


0.8794 


1 & 


2 


Combined, RI 


Placebo 


0.8259 


26 & 


; 27 


0.8981 


1 & 


2 


Combined, RI 


Treatment 


0.8383 


26 & 


; 27 


0.8744 


1 & 


2 


Poisson-normal, RI+RS 


Placebo 


0.2966 


1 & 


; 27 


0.9512 


26 & 


; 27 


Poisson-normal, RI+RS 


Treatment 


0.2936 


1 & 


; 27 


0.9530 


26 & 


; 27 


Combined, RI+RS 


Placebo 


0.4268 


1 & 


; 27 


0.9281 


26 & 


; 27 


Combined, RI+RS 


Treatment 


0.4225 


1 & 


; 27 


0.9329 


26 & 


; 27 



the fixed-effects structure is not constant but rather 
depends on time, we have to apply the general corre- 
lation function (F.13). In the Poisson-normal case 
with random intercepts only, and for the placebo 
group, based on the parameter estimates in Table 2, 
we obtain 

Corr (Y (f), Y(s)) = 35.58 • 0.99* +s 

/(\/(4.04 • 0.99* + 35.58 • 0.97*) 
■ ^(4.04 • 0.99 s + 35.58 • 0.97 s )), 



where Y{t) represents the outcome for an arbitrary 
subject at time t. Calculations in all other cases 
are similar. The smallest and largest values for the 
correlation functions, for both arms, for both the 
Poisson-normal and combined models, and for both 
choices of the random-effects structure are given in 
Table 3. When only random intercepts are consid- 
ered, the correlations range over a narrow interval; 
they are rather high and there is little difference 
between the Poisson-normal and combined models. 
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However, turning to the models with random inter- 
cepts and random slopes, several differences become 
apparent. First, the values exhibit a much broader 
range between their smallest and largest values. Sec- 
ond, the range is somewhat overestimated by the 
Poisson-normal model, which then narrows when 
we switch to the combined model, thereby incorpo- 
rating over dispersion effects, random intercepts and 
random slopes. Thus, the random slope allows for 
the correlation to range over a considerable inter- 
val, while the overdispersion effect avoids the range 
to be overly wide. 

Within each model, there is relatively little dif- 
ference between the placebo and treated groups, al- 
though the difference is a bit more pronounced in 
the combined model. Further, the correlation range 
within every group is relatively narrow. The most 
noteworthy feature, unquestionably, is the large dis- 
crepancy between both models. This is because the 
Poisson-normal model forces the correlation and 
overdispersion effects to stem from a single addi- 
tional parameter, the random-intercept variance d. 
Thus, considerable overdispersion also forces the cor- 
relation to increase, arguably beyond what is con- 
sistent with the data. In the combined model, in 
contrast, there are two additional parameters, giv- 
ing proper justice to both correlation and overdis- 
persion effects. It was already clear from the above 
discussion and that in Molenberghs, Verbeke and 
Demetrio (2007) that the combined model is an im- 
portant improvement. This now clearly manifests it- 
self in the correlation function, too. 

6.2 A Clinical Trial in Onychomycosis 

We will analyze the binary onychomycosis data, 
introduced in Section 2.2. For the logit, consider the 
model 

Yij\(bi) ~ Bernoulli^), 
logit(vr 4J ) =£i(l - Ti) + bi + 6(1 - Ti)Uj 

(55) 

+ ^sTi + iiTiUj, 

where Tj is the treatment indicator for subject i, tij 
is the time-point at which the jth measurement is 
taken for the ith subject, and b{ ~ iV(0, d). Parame- 
ter estimates for the logistic model, with and with- 
out the normal random effect, on the one hand, and 
with and without the beta-binomial component, on 
the other hand, as described in Section 4.6, are pre- 
sented in Table 4. Observe that the model becomes 
hard to fit when the beta random effects are present, 



which is seen from estimates and standard errors in 
both the beta-binomial model as well as the com- 
bined model. To understand this, we must observe 
that the conjugate random effects in the Bernoulli 
case, unlike in the Poisson, binomial and Weibull 
cases, cannot add to the variability, only to the cor- 
relation structure. This means that there is consid- 
erably less information available than in the other 
cases. This does not mean that the beta random ef- 
fects are unnecessary, but rather that they challenge 
the stable estimation of other model parameters. 

6.3 Recurrent Asthma Attacks in Children 

We will analyze the times-to-event, introduced in 
Section 2.3. We consider an exponential model, that 
is, a model of the form (41) with p=l, and further 
a predictor of the form 

Kij =£,o + bi + £,iTi, 

where Tj is an indicator for treatment and bi ~ iV(0, d) 
Results from fitting all four models (with/ 
without normal random effect; with/without gamma 
random effect) can be found in Table 5. A formal as- 
sessment of the treatment effect from all four models 
is given in Table 6. The treatment effect £i is sta- 
bly identifiable in all four models. As can be seen 
from Table 6, the treatment effects are similar in 
strengths, but including both random effects reduces 
the evidence, relative to the exponential model. Need- 
less to say, too parsimonious an association struc- 
ture might lead to liberal test behavior. 

6.4 The Need for the Combined Model 

We have some evidence from the above three ex- 
amples that there is a need for the combined model. 
Some indication came, for example, from the cor- 
relation functions in the epilepsy case. It is useful 
to perform formal comparison of all nested models, 
using Wald statistics, for each of the three cases. 
A summary is given in Table 7. Note that, owing 
to the familiar boundary problem that occurs when 
testing for variance components, mixtures of a Xo 
and x\ were used, instead of the conventional x\ 
(Molenberghs and Verbeke, 2007). In all three case 
studies it is clear that: (1) independence is strongly 
rejected in favor of both a model with normal ran- 
dom effects or a model with conjugate random ef- 
fects; (2) on top of one set of random effects, there is 
a clear need for the other set as well, hence provid- 
ing very strong evidence for the proposed combined 
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Table 4 

Onychomycosis study. Parameter estimates and standard errors for the regression coefficients in 
(1) the logistic model, (2) the beta-binomial model, (3) the logistic-normal model and (4) the 
combined model. Estimation was done by maximum likelihood using numerical integration over the 

normal random effect, if present 



Effect 


Parameter 


Estimate (s.e.) 
Logistic Beta binomial 


Intercept treatment A 




-0.5571 (0.1090) 


17.9714 (1482.6) 


Slope treatment A 


6 


-0.1769 (0.0246) 


5.2454 (12970.0) 


TntprppT^f trpfltrnpnt R 

1111C1\jCUL LLLdLlllCllt 1 J 




-0.5335 (0.1122) 


18.6744 (2077.13) 


Slope treatment B 




-0.2549 (0.0309) 


4.7775 (12912.0) 


Std. dev. random effect 


Vd 






Ratio 


a/P 




3.6739 (0.2051) 


— 21og-likelihood 




1812 


1980 






Logistic-normal 


Combined 


Intercept treatment A 


Co 


-1.6299 (0.4354) 


-1.6042 (4.0263) 


Slope treatment A 


Ci 


-0.4042 (0.0460) 


-6.4783 (1.4386) 


Intercept treatment B 


6 


-1.7486 (0.4478) 


-16.2079 (3.5830) 


Slope treatment B 




-0.5634 (0.0602) 


-8.0745 (1.5997) 


Std. dev. random effect 


sfd 


4.0150 (0.3812) 


60.8835 (14.2237) 


Ratio 


a//3 




0.2805 (0.0350) 


— 21og-likclihood 




1248 


1240 



model. The evidence is extremely convincing in all 7. CONCLUDING REMARKS 

three cases. j n ^ g p a p er we h ave argued that, rather than 

These findings, taken together, imply that the data choosing between normal and nonnormal random ef- 

exhibit, at the same time, within-subject correlation f ec ts, the latter often of a gamma, beta or other con- 



and overdispersion. 



jugate type, both can usefully be integrated together 



Table 5 

Asthma study. Parameter estimates and standard errors for the regression coefficients in (1) the 
exponential model, (2) the exponential-gamma model, (3) the exponential-normal model and (4) the 
combined model. Estimation was done by maximum likelihood using numerical integration over the 

normal random effect, if present 



Estimate (s.e.) 



Effect 


Parameter 


Exponential 


Exp onent ial gamma 


Intercept 




-3.3709 (0.0772) 


-3.9782 (15.354) 


Treatment effect 


6 


-0.0726 (0.0475) 


-0.0755 (0.0605) 


Shape parameter 


A 


0.8140 (0.0149) 


1.0490 (16.106) 


Std. dev. random effect 


Vd 






Gamma parameter 


7 




3.3192 (0.3885) 


— 21og-likelihood 




18,693 


18,715 






Exponential-normal 


Combined 


Intercept 


£o 


-3.8095 (0.1028) 


3.9923 (20.337) 


Treatment effect 




-0.0825 (0.0731) 


-0.0887 (0.0842) 


Shape parameter 


A 


0.8882 (0.0180) 


0.8130 (16.535) 


Std. dev. random effect 




0.4097 (0.0386) 


0.4720 (0.0416) 


Gamma parameter 


7 




6.8414 (1.7146) 


— 21og- likelihood 




18,611 


18,629 
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Table 6 

Asthma study. Wald test results for the 
assessment of treatment effect 



Model 


.Z-value 


p- value 


Exponential 


-1.5283 


0.1264 


Exponential-gamma 


-1.1293 


0.2588 


Exp onent ial-normal 


-1.2480 


0.2120 


Combined 


-1.0534 


0.2921 



into a single model, which we have termed the com- 
bined model. Our work builds upon that of Molen- 
berghs, Verbeke and Demetrio (2007), who brought 
together normal random effects to induce associa- 
tion between repeated Poisson data, and a gamma 
distributed random factor in the log-linear predictor 
to fine tune the overdispersion. Their model pro- 
duces the standard negative-binomial and Poisson- 
normal models as special cases, both when there are 
repeated measures as well as with univariate out- 
comes. 

The current paper builds upon this work, not only 
by considering other important cases, such as binary 
and time-to-event data and, for completeness, also 
the normally distributed case, but, in particular, 
by providing an encompassing framework around 
it. Wherever possible, explicit expressions for the 
marginal joint distributions are derived, as well as 
for marginal means, variances, covariances and mo- 
ments in general. This is possible in all cases, includ- 
ing the Poisson and Weibull cases, but for binary 

Table 7 

All three case studies. Wald test results for comparison of 
nested models 



Null model 



Alternative model Z-vahie p-value 



Epilepsy study 
Poisson 
Poisson 

Negative-binomial 
Poisson-normal 

Onychomycosis study 
Logistic 
Logistic 
Beta-binomial 
Logistic-normal 



Negative-binomial 
Poisson-normal 
Combined 
Combined 

Beta-binomial 
Logistic-normal 
Combined 
Combined 



20.68 
6.27 
6.10 

11.66 



<0.0001 
<0.0001 
<0.0001 
<0.0001 



Asthma study 

Exponential Exponential-gamma 
Exponential Exponential-normal 
Exponential-gamma Combined 
Exponential-normal Combined 



17.91 <0.0001 

10.53 <0.0001 

4.28 <0.0001 

8.01 <0.0001 

8.54 <0.0001 

10.63 <0.0001 

8.54 <0.0001 

3.99 <0.0001 



data the logit links defies such a closed form. How- 
ever, we showed that a switch to the probit link does 
allow for closed forms. The existence of these closed 
forms, producing expressions for a variety of gener- 
alized linear mixed models as special cases, has not 
been known to its fullest extent. We discuss their 
implications for: (1) general understanding; (2) de- 
rived quantities such as correlations, treatment ef- 
fects, etc.; and (3) the construction of parameter 
estimation and implementation. 

For the binary case, we have exploited the logit- 
probit relationship to derive probit-based closed-form 
approximations to the logit case. For the Weibull sit- 
uation, we have additionally generated a family of 
distributions that encompass an entire collection of 
Cauchy-type distributions. 

To make these developments possible in their fullest 
generality, we have introduced strong conjugacy, 
which comes down to a version of the well-known 
conjugacy that is compatible with the additional in- 
troduction of normal random effects. 

In terms of estimation, we have focused on max- 
imum likelihood estimation. This can be done by 
integrating over the random effects, either fully an- 
alytically, using the explicit expressions derived, or 
by combining analytic and numeric techniques. The 
latter has been implemented in the SAS procedure 
NLMIXED, for the Poisson, binary and survival cases, 
and applied to three case studies. 

Of course, with the considerations of not only one 
but multiple sets of random effects comes the obli- 
gation to reflect on the precise nature of such latent 
structures. As underscored by Verbeke and Molen- 
berghs (2009), full verification of the adequacy of 
a random-effects structure is not possible based on 
statistical considerations alone, because there is a 
many-to-one map from hierarchical models to the 
implied marginal model. Of course, this should not 
stop the user from considering such models, but 
rather issues a word of caution. 

A number of topics have been mentioned in this 
paper that deserve further research. These include, 
but are not limited to, the following: (1) the con- 
struction of model building and goodness-of-fit tool; 

(2) a detailed study of the relative merits of vari- 
ous estimation methods and their implementation; 

(3) a study of the identifiability of (random-effects) 
parameters in the combined model; (4) the incorpo- 
ration of censoring in the survival case; and (5) the 
explicit consideration of data types and models not 
considered here. 
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The Poisson, binary and Weibull cases have 
been implemented in the SAS procedure 
NLMIXED. All datasets, programs and outputs can 
be found in a WinZip archive on the web site 
www . censtat . be/ software. 
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SUPPLEMENTARY MATERIAL 

A family of generalized linear models for repeated 
measures with normal and conjugate random ef- 
fects: Calculation details 

(DOI: 10.1214/10-STS328SUPP; .pdf). In Section 
A, generic approximate calculations are provided. 
Closed-form calculations for various cases are of- 
fered as well: for the Poisson case (Section B), for 
the binary case with logit link (Section C), for the 
binary case with probit link (Section D), and for the 
time-to-event case (Section E). Finally, Section F is 
dedicated to the derivation of marginal correlation 
functions. 
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